ABSTRACT

In a variety of psychological and educational
situations, it is desirable to be able to make data-based evaluative
summary statements regarding the impact of a given program. Certain
procedures typically used in meta-analytic studies that review and
integrate results from individual studies, such as combined tests and
measures of effect size, are particularly well suited for program
evaluation in certain situations. This paper describes a number of
such situations, briefly reviews the literature on combined tests and
effect size, and provides several illustrative numerical examples of
their application in program evaluation. The three examples
illustrate the practical utility of using combined tests and measures
of effect size in program evaluations in situations where data are
available either cross-sectionally, or on successive occasions, or on
independent components of a larger program. The materials suggest
that measures of effect size are clearly valuable in providing
potential insight into the differential impact of a given program,
information that is more obscured when relying solely on statistical
tests. (Author/JAC)

Meta-Analytic Applications in Program Evaluation

Arriving at data-based summary statements regarding the effectiveness
of a given program is of considerable potential value for interpreting the
outcomes of psychological and educational evaluations. For example, an
evaluator may wish to integrate the independent outcome results of a program
implemented cross-sectionally across various age or grade levels. In another
situation, an evaluator may wish to integrate the results of a program
implemented with independent samples of similar subjects over successive time
periods, such as quarters, semesters, or years. In still another situation, an
evaluator may wish to integrate the results of various independent services that
an educational or social service agency provides. These three situations will be
referred to as the a) cross-sectional, b) independent samples/similar subjects -
successive occasions, and c) independent program components cases,
respectively. Certain procedures, such as combined tests and measures of
effect size, that are typically used in meta-analytic studies to statistically
integrate the findings of a large collection of results from individual studies,
are particularly well suited for program evaluation in these situations.

The purpose of the present paper is to briefly review the recent literature
on combined tests and effect size, indicate how they may be used effectively in
program evaluation, and provide several illustrative numerical examples of
their actual application in program evaluation. It should be understood that
these procedures are distinct from those known as meta evaluation (Cook &
Gruder, 1979), which denotes the evaluation of evaluations.



Combining Results of Independent Tests 

Statistical methods available for combining the results of independent
studies range from various counting procedures to a variety of summation
procedures involving either significance levels (probabilities or their
logarithmic transformations) or raw or weighted test statistics such as ts or zs.
Since R. A. Fisher (1932) and Karl Pearson (1933) independently addressed
the issue of statistically summarizing the results of independent tests of the
same hypothesis, interest in these types of procedures has continued. More
recently this process has been called meta-analysis (Glass, 1976), for "statistical
analysis of a large collection of analysis results from individual studies for the
purpose of integrating the findings" (p. 3). For a thorough review of the
"traditional" meta-analysis approach to the review and synthesis of research
literature, the reader is referred to Glass (1976, 1978) and Glass, McGaw, and
Smith (1981). The present paper addresses the application of these procedures to
program evaluation rather than to the synthesis of research literature on a
given topic.

These procedures have become known as "combined tests," and have been
illustrated by Rosenthal (1978) and Winer (1971), among others. While a variety
of tests for combining the results of independent tests of the same hypothesis
have been put forward (see Birnbaum, 1954; Rosenthal, 1978; Van Zwet and
Oosterhoff, 1967 for reviews of these tests), only the procedures presented by
Fisher (1932, 1948), Winer (1971), and Stouffer (1949; Mosteller & Bush, 1954) will
be discussed in the present paper.

In addressing the question of combining the results of a number of
independent tests which have all been planned to test a common hypothesis,
Fisher described a method based on the product of probabilities from different
trials. If the natural logarithms of these probabilities are calculated,
multiplied by minus two (-2), and then summed, a chi square with degrees of
freedom equal to two times the number of tests combined (2n) is obtained (the
logarithmic transformation permits a summative rather than a multiplicative
function, thereby simplifying calculations). This may be expressed in the form
of

$$\chi^2 = -2 \sum_{i=1}^{n} \log_e p_i \qquad (1)$$

with df = 2n,

where n = number of tests combined

and p = one-tailed probability associated with each test.
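As a minimal sketch of formula 1, the following Python fragment computes the Fisher chi square and its upper-tail probability. The function names and the three p values are hypothetical, chosen only for illustration. Because df = 2n is always even, the chi-square tail has a closed form, so no statistical library is assumed.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate with even df.

    For df = 2n the tail equals exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def fisher_combined(p_values):
    """Return (chi_square, df, combined_p) for one-tailed p values (formula 1)."""
    chi_square = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * len(p_values)
    return chi_square, df, chi2_sf_even_df(chi_square, df)

# Hypothetical one-tailed p values from three independent tests.
chi_square, df, combined_p = fisher_combined([0.04, 0.10, 0.20])
print(f"chi square = {chi_square:.2f}, df = {df}, combined p = {combined_p:.4f}")
```

None of the three hypothetical tests is significant at the .01 level on its own, and only one reaches .05, yet the combined evidence is stronger than any single test suggests.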

This procedure has been shown to be more efficient than several of the
other combining methods (Koziol & Perlman, 1978; Littell & Folks, 1973),
although it suffers from several limitations (Rosenthal, 1978). Mosteller and
Bush (1954) noted that it can yield results inconsistent with a simple sign test in
situations where the majority of a large number of studies showed results in one
direction with p values close to .50 (i.e., chance). In this situation the sign test
could easily reject the overall null hypothesis, while the Fisher procedure would
not. The Fisher procedure would thus yield more conservative results in this
situation, a result not terribly disturbing given the recent recommendations of
reporting the effect size as well as the overall probability level when using
combined tests (McGaw & Glass, 1980; Rosenthal, 1978). That is, while the sign
test would be significant in this instance, the effect size would likely be small
and thus more appropriately tested with the Fisher method, which would result
in non-significance.

A more serious disadvantage of the Fisher procedure, however, is its
support for the significance of either outcome when two studies of equally and
strongly significant results in opposite directions are obtained (Adcock, 1960).
Even given these limitations, this procedure remains one of the best known and
applied.
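Both limitations can be seen in a small numerical sketch. All study counts and p values below are hypothetical and were picked only to reproduce the two situations described above; the helper names are likewise illustrative.

```python
import math
from statistics import NormalDist

def fisher_p(p_values):
    """Combined upper-tail probability from the Fisher procedure (df = 2n)."""
    half = -sum(math.log(p) for p in p_values)  # half of the chi square
    term, total = 1.0, 1.0
    for k in range(1, len(p_values)):
        term *= half / k
        total += term
    return math.exp(-half) * total

def sign_test_p(successes, n):
    """One-tailed probability of at least `successes` out of n at chance (.5)."""
    return sum(math.comb(n, k) for k in range(successes, n + 1)) / 2 ** n

# Situation 1 (hypothetical): 20 studies, 16 in the predicted direction with
# one-tailed p = .40 each and 4 in the other direction with p = .60 each.
weak_p_values = [0.40] * 16 + [0.60] * 4
sign_p = sign_test_p(16, 20)
fisher_weak_p = fisher_p(weak_p_values)

# Situation 2 (hypothetical): two studies, strongly significant in opposite
# directions. Testing either direction gives the same small Fisher p, while
# the two z values cancel when summed.
opposed = [0.001, 0.999]
fisher_opposed_p = fisher_p(opposed)
fisher_reversed_p = fisher_p([1.0 - p for p in opposed])
summed_z = sum(NormalDist().inv_cdf(1.0 - p) for p in opposed) / math.sqrt(2)

print(f"sign test p = {sign_p:.4f}, Fisher p = {fisher_weak_p:.2f}")
print(f"opposed studies: Fisher p = {fisher_opposed_p:.4f} in either direction, "
      f"summed z = {summed_z:.2f}")
```

In the first situation the sign test rejects the null hypothesis while the Fisher procedure does not; in the second, the Fisher procedure supports significance whichever direction is tested.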

Winer (1971) has presented a procedure for combining independent tests
that comes directly from the sampling distribution of independent t-statistics, in
which the t-statistics associated with each test are summed and divided by the
square root of the sum of the degrees of freedom (df) associated with each t
after each df has been divided by df - 2. This may be expressed in the form of

$$z = \frac{\sum_{i} t_i}{\sqrt{\sum_{i} \left[ df_i / (df_i - 2) \right]}} \qquad (2)$$

This procedure is based on df/(df - 2) being the variance of a
t-distribution, which is approximately normally distributed (N(0,1)) when df ≥ 10.
Thus this procedure is not appropriate for tests based on very small samples
(less than 10) and, as Rosenthal (1978) pointed out, "cannot be employed at all
when the size of the samples for which t is computed becomes less than three,
because that would involve dividing by zero or by a negative value." In
practice, however, it is not common for tests of significance to be applied to
such small samples, thereby minimizing the effect of this disadvantage.
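Formula 2 can be sketched in a few lines of Python. The function name, the t values, and the degrees of freedom below are hypothetical, and the guard on df simply mirrors the restriction Rosenthal describes.

```python
import math
from statistics import NormalDist

def winer_combined(t_values, dfs):
    """Return (z, one_tailed_p) for the Winer procedure (formula 2)."""
    if len(t_values) != len(dfs):
        raise ValueError("each t needs its own df")
    if any(df <= 2 for df in dfs):
        # df / (df - 2) is undefined or negative for df <= 2.
        raise ValueError("every df must exceed 2")
    z = sum(t_values) / math.sqrt(sum(df / (df - 2) for df in dfs))
    return z, 1.0 - NormalDist().cdf(z)

# Hypothetical t statistics and degrees of freedom from three independent tests.
z, p = winer_combined([2.1, 1.4, 2.6], [30, 24, 40])
print(f"z = {z:.3f}, one-tailed p = {p:.5f}")
```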

A third approach originally attributed to Stouffer (1949) is more fully
described by Mosteller and Bush (1954) and Rosenthal (1978). It is similar to the
Winer procedure of summing ts, with the exception that p values are converted
to zs instead of to ts, and then summed. The denominator then simplifies to the
square root of the number of tests combined, and the complete expression takes
the form of

$$z = \frac{\sum_{i} z_i}{\sqrt{N}} \qquad (3)$$

where N = number of tests combined. This procedure is based on the sum of
normal deviates being itself a normal deviate, with the variance equal to the
number of observations summed.
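A minimal sketch of formula 3 follows, using the standard library's normal distribution to convert each one-tailed p value to its z. The function name and the input p values are hypothetical.

```python
import math
from statistics import NormalDist

def stouffer_combined(p_values):
    """Return (z, one_tailed_p) for the Stouffer procedure (formula 3)."""
    normal = NormalDist()
    # A one-tailed p corresponds to the z that leaves p in the upper tail.
    z_values = [normal.inv_cdf(1.0 - p) for p in p_values]
    z = sum(z_values) / math.sqrt(len(z_values))
    return z, 1.0 - normal.cdf(z)

# Hypothetical one-tailed p values from three independent tests.
z, p = stouffer_combined([0.04, 0.10, 0.20])
print(f"z = {z:.3f}, one-tailed p = {p:.4f}")

# A one-tailed p of .0005 corresponds to a z of about 3.29.
z_floor = NormalDist().inv_cdf(1.0 - 0.0005)
print(f"z for p = .0005: {z_floor:.2f}")
```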

The Stouffer procedure offers several advantages. The calculations are
more straightforward than both the Fisher procedure, which necessitates
logarithmic transformations, and the Winer procedure, which makes an
adjustment for degrees of freedom. In addition, results of the z procedure,
while slightly more powerful, are virtually identical to results of the t
procedure (Wolf & Spies, 1981). This is particularly true when the statistics
summed are derived from large samples, as df/(df - 2) approaches unity as sample
size increases.

Measuring Effect Size 

Glass' exposition and application of meta-analysis relies heavily on the use
of measures of effect size that have been eloquently summarized by Cohen
(1977). Cohen states, "Without intending any necessary implication of causality,
it is convenient to use the phrase 'effect size' to mean 'the degree to which the
phenomenon is present in the population', or 'the degree to which the null
hypothesis is false'. Whatever the manner of representation of a phenomenon in
a particular research in the present treatment, the null hypothesis always
means that the effect size is zero" (pp. 9-10).

Statistical tests such as the combined procedures previously described
provide a summary index of the statistical significance of the results pertaining
to an hypothesis. They do not, however, provide any insight into the strength of
the relationship or effect of interest. The desirability of accompanying
combined tests with indexes of effect size has been noted by Rosenthal (1978).
McGaw and Glass (1980) and Glass, McGaw, and Smith (1981) provide helpful
guidelines for converting various summary statistics into a common metric,
usually in the form of the Pearson Product Moment Correlation. Cohen (1977)
provides measures of effect size for most common statistical tests. Because
many program evaluations consist of pre-post and/or experimental-control
group designs, the effect size measures for t-tests between means will be
illustrated here. The reader is referred to the above references for measures of
effect size appropriate for other statistical tests.

The goal is to obtain "a pure number, one free of our original
measurement unit, with which to index what can be alternatively called the
degree of departure from the null hypothesis of the alternative hypothesis, or
the ES (effect size) we wish to detect. This is accomplished by standardizing
the raw effect size as expressed in the measurement unit of the dependent
variable by dividing it by the (common) standard deviation of the measures in
their respective populations, the latter also in the original measurement"
(Cohen, 1977, p. 20).

This may be accomplished in the form of

$$d = \frac{|\bar{X}_1 - \bar{X}_2|}{\sigma} \qquad (4)$$

where d = ES index for t-tests of means in standard units, $\bar{X}_1$ and
$\bar{X}_2$ = sample means in original measurement units, and $\sigma$ =
standard deviation of either sample (as homogeneity of variance is assumed).
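Formula 4 reduces to a one-line computation. The sketch below uses a hypothetical function name; the input values are the first-grade pretest mean, posttest mean, and pretest standard deviation reported in Table 1.

```python
def effect_size_d(mean_1, mean_2, sd):
    """Standardized mean difference d (formula 4)."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return abs(mean_1 - mean_2) / sd

# Grade 1 values from Table 1: pretest M = 0.9, posttest M = 2.4, pretest Sd = .62.
d_grade_1 = effect_size_d(0.9, 2.4, 0.62)
print(f"d = {d_grade_1:.2f}")
```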

The means, $\bar{X}_1$ and $\bar{X}_2$, are typically the experimental and control group
means in posttest-only control group experimental designs, or pre and post
means in one group pretest-posttest pre-experimental designs. It should be
noted that the latter design may be considered "primitive yet adequate if the
treated group members' pretreatment status is a good estimate of their
hypothetical post-treatment status in the absence of treatment" (Glass, 1978).
This is an empirical question that can be studied to determine if maturation,
pre-test sensitization, or other threats to the validity of this design have in fact
biased this estimate. In fact, Campbell (1982) has recently indicated that the one
group pretest-posttest design has "now been elevated to a useful quasi-experimental
or proto-experimental design" in the planned revision of his classic work on
research design (Campbell & Stanley, 1963).

The standard deviation, $\sigma$, is typically either the control group or pretest
standard deviation, as it is assumed that the two group variances are equal.
Another possibility would be to use the within population standard deviation.

Once the effect size, d, is determined, Cohen provides tables to translate
d into measures of nonoverlap (U) between the two groups, which translate
rather nicely into graphical displays which facilitate interpretation of the
results. Perhaps the most useful index of nonoverlap is Cohen's U3, which
translates average performance in percentiles (area under the normal curve) of
the experimental (or posttest) group to the equivalent percentile of the control
(or pretest) group. This will be illuminated with the following numerical
illustrations.
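Under the normality assumption, U3 is simply the area under the standard normal curve below d, so it can be obtained without Cohen's tables. The following sketch assumes that reading; the function name and the d values are illustrative only.

```python
from statistics import NormalDist

def u3(d):
    """Cohen's U3: proportion of the control (or pretest) distribution
    falling below the experimental (or posttest) mean, assuming normality."""
    return NormalDist().cdf(d)

# Illustrative d values and the percentile each corresponds to.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d:.1f}  ->  U3 = {100 * u3(d):.1f}%")
```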

Some Illustrative Examples 
The following numerical examples provide concrete illustration of these
computational methods. To consolidate the various examples, all three
illustrations use one group pretest-posttest designs, as these were the designs of
the actual programs evaluated. Obviously, the computations would be the same
if a posttest-only control group experimental design had been used, with the
control group mean replacing the pretest mean and the experimental group
mean replacing the posttest mean. In this instance the control group standard
deviation would be used instead of the pretest standard deviation.

A decision rule that will be employed throughout is to use no significance
level less than .001, two-tailed, or .0005, one-tailed. This convention leads to a
more conservative result when p values rather than the raw test statistics are
used. In addition, it should be noted that one-tailed tests are used with
combined tests (Fisher, 1932; Rosenthal, 1978, 1980; Winer, 1971), inasmuch as
the results of the prior independent studies are known and the direction of the
hypothesis should therefore be clear.

Case A: Cross-Sectional Synthesis

Alternate forms of the Comprehensive Tests of Basic Skills (CTB/McGraw-Hill,
1973/1975) were administered under standard conditions in October at the
beginning of the school year and again in May at the close of the year to 2,630
students in Grades 1 to 8 from all three elementary and both middle schools in a
rural midwestern community of approximately 20,000 inhabitants. The CTBS
mathematics subscales were used as part of the evaluation of a federally funded
mathematics program (Wolf & Blixt, 1979, 1981). A one group pretest-posttest
pre-experimental design was used to assess the change in mathematics
achievement at each grade level. Results of paired t-tests summarized in Table
1 indicated that students at each grade level exhibited significant (p < .001,
two-tailed tests) improvement (paired t = 14.17 to 43.32).



Insert Table 1 about here 



To make one summary statement, the results of all 8 of these independent
tests (Table 2) of the research hypothesis that students would exhibit
significant gains in their mathematics achievement can be combined. Applying
the Fisher procedure described in formula 1 results in:

$$\chi^2 = 15.2 + 15.2 + 15.2 + 15.2 + 15.2 + 15.2 + 15.2 + 15.2 = 121.6 \qquad (5)$$

Because there are eight independent tests of this hypothesis, one for each grade
level, there are 2n or (2)(8) = 16 degrees of freedom. The critical value for an
alpha level of .001 with 16 df, one-tailed, is 39.25. Not surprisingly, the
combined evidence from the eight tests indicates that the research hypothesis
of significant gains in achievement is supported when the scope of the inference
is with respect to the combined populations.



Insert Table 2 about here 



Similarly, when applying formula 2 for the Winer procedure to the same
data, the following result is obtained:

$$z = \frac{43.32 + 35.47 + 36.11 + 24.39 + 19.24 + 17.30 + 18.04 + 14.17}{\sqrt{\frac{308}{306} + \frac{361}{359} + \frac{362}{360} + \frac{339}{337} + \frac{330}{328} + \frac{303}{301} + \frac{321}{319} + \frac{298}{296}}} = \frac{208.04}{2.84} = 73.25 \qquad (6)$$

The probability of obtaining this value of z or one larger is p (z ≥ 73.25)
< .001, one-tailed.

Analogous results are also obtained when formula 3 for the Stouffer
procedure is applied to the data. In this approach, however, the one-tailed p
values are converted to their analogous z-statistics and then summed and
divided by the square root of the number of tests summed:

$$z = \frac{3.3 + 3.3 + 3.3 + 3.3 + 3.3 + 3.3 + 3.3 + 3.3}{\sqrt{8}} = \frac{26.4}{2.83} = 9.33 \qquad (7)$$

The probability of obtaining this value of z or larger is p (z ≥ 9.33) < .001,
one-tailed. Because a decision rule not to use p values less than .0005,
one-tailed, was implemented, it is noteworthy that when these p values are converted
to z-statistics, the resultant z-statistics are markedly lower than the
t-statistics obtained from the original raw data. However, the overall result is
equivalent.
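The three combined tests for Case A can be recomputed from the tabled values as a check. The sketch below takes the paired t values and sample sizes from Tables 1 and 2 and the one-tailed floor of .0005 from the decision rule; the variable names are hypothetical. Carrying full precision gives values slightly different from the hand calculations, which round the Winer denominator to 2.84 and each z to 3.3.

```python
import math
from statistics import NormalDist

# Paired t values and sample sizes for Grades 1-8 (Tables 1 and 2).
t_values = [43.32, 35.47, 36.11, 24.39, 19.24, 17.30, 18.04, 14.17]
n_values = [309, 362, 363, 340, 331, 304, 322, 299]
dfs = [n - 1 for n in n_values]   # paired t-test: df = n - 1
p_floor = 0.0005                  # smallest one-tailed p the decision rule allows
n_tests = len(t_values)

# Fisher (formula 1): every test enters at the floor, so the terms are equal.
fisher_chi2 = -2.0 * n_tests * math.log(p_floor)
fisher_df = 2 * n_tests
critical_value = 39.25            # chi square, 16 df, alpha = .001 (from the text)

# Winer (formula 2): sum of t over the root of summed t-distribution variances.
winer_z = sum(t_values) / math.sqrt(sum(df / (df - 2) for df in dfs))

# Stouffer (formula 3): each floored p converts to the same z.
z_floor = NormalDist().inv_cdf(1.0 - p_floor)
stouffer_z = n_tests * z_floor / math.sqrt(n_tests)

print(f"Fisher chi square({fisher_df}) = {fisher_chi2:.1f} "
      f"(critical value {critical_value})")
print(f"Winer z = {winer_z:.2f}")
print(f"Stouffer z = {stouffer_z:.2f}")
```

The gap between the Winer and Stouffer values reflects the decision rule: the Winer procedure uses the raw t-statistics, while the Stouffer procedure starts from p values that have been floored at .0005.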

Given that the differences between the pre and posttest means were
highly significant at each grade level, it is hardly surprising that the overall
combined test is also highly significant. In this instance, the magnitude of the
effect may be of more practical importance and interest. Applying the effect
size formula for d in equation 4 to the data for students in the first grade
provides the following result:

$$d = \frac{|0.9 - 2.4|}{0.62} = \frac{1.5}{0.62} = 2.42 \qquad (8)$$

Cohen (1977) provides interpretative guidelines for effect size, with d = .2
indicative of a small effect, d = .5 indicative of a medium effect, and d = .8
indicative of a large effect. Clearly the effect for first graders in the example
is a large one. Another way of conveying the same conclusion is to translate d
into a measure of overlap (U); Cohen (1977) provides tables for making this
transition, although values obtained from a normal distribution table are
essentially equivalent to Cohen's tabled values. A d value of 2.42 translates
into a U3 value of .992. This means that the average score (50th percentile) on
the posttest was equivalent to the 99.2nd percentile on the pretest.

The effect size would be calculated in a similar fashion for each of the
other seven grade levels. These individual effect sizes typically are averaged
to obtain the mean effect size over all grade levels, which in the present
instance was 1.19. This average effect size translates into a U3 value of .838.
Thus across all eight grade levels we could expect the average performance on
the posttest to be equivalent to the 83.8th percentile on the pretest. This is
presented graphically in Figure 1.
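The per-grade effect sizes and their mean can be recomputed from the Table 1 means and pretest standard deviations, as in the sketch below (variable names are hypothetical, and U3 is taken as the normal-curve area below d). Worked this way, the mean of the eight per-grade U3 values comes out near .838, whereas the normal-curve area at the mean d of 1.19 is about .883, so the reported average percentile appears to correspond to the former.

```python
from statistics import NormalDist, mean

# (pretest M, pretest Sd, posttest M) for Grades 1-8, from Table 1.
grades = [
    (0.9, 0.62, 2.4),
    (2.1, 0.64, 3.1),
    (3.1, 0.75, 4.6),
    (4.4, 1.33, 5.6),
    (5.4, 1.55, 6.7),
    (6.2, 1.93, 7.5),
    (7.2, 2.09, 8.5),
    (8.2, 2.29, 9.4),
]

normal = NormalDist()
d_values = [abs(post - pre) / sd for pre, sd, post in grades]  # formula 4
u3_values = [normal.cdf(d) for d in d_values]

mean_d = mean(d_values)
mean_of_u3 = mean(u3_values)      # average of the per-grade percentiles
u3_of_mean_d = normal.cdf(mean_d) # percentile at the average effect size

for grade, (d, u) in enumerate(zip(d_values, u3_values), start=1):
    print(f"Grade {grade}: d = {d:.2f}, U3 = {100 * u:.1f}%")
print(f"mean d = {mean_d:.2f}")
print(f"mean of U3 values = {mean_of_u3:.3f}, U3 at mean d = {u3_of_mean_d:.3f}")
```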

Insert Figure 1 about here



Interestingly, however, effect sizes ranged from a high of 2.42 (large
effect) for first graders to a low of 0.52 (medium effect) for eighth graders,
with a generally downward trend with increasing age (grade level). An
examination of means and standard deviations for individual grade levels
suggests that this decreasing trend is a result, in part, of the increasing
variance associated with increasing grade levels. This could perhaps suggest
that individual differences in mathematics achievement are relatively
homogeneous at the beginning of formal education, but become much more
pronounced with greater educational experience. This in turn suggests that the
program on the average had greatest impact in the earlier grades, even though
the impact was quite noticeable throughout.

It is noted, however, that these interpretations are very speculative given
the nature of pre-post designs. That is, threats to the validity of these results
through maturation and normal academic progress effects (not resulting from
this specific treatment program) are uncontrolled in this design. A more
appropriate evaluation design would be to compare the performance of each
grade after it had the program with that of the same grade for the previous
year, which did not. This would then confound the program treatment effects
with only historic and cohort differences. The same combined test and effect
size procedures could then be performed on this non-equivalent control group
design as were presented here. The present example was presented only for
illustrative purposes.

Case B: Independent Samples/Similar Subjects-Successive
Occasions Synthesis

First-year medical students participated in a 10 week course designed to
improve their communication and interviewing skills (Engler, Saltzman, Walker
& Wolf, 1981; Saltzman, Wolf, Savickas & Walker, 1981; Wolf, 1981). A Standard
Index of Communication (Carkhuff, 1969) was administered both before and
after training in a one sample pretest-posttest design. The first three
successive classes of students each exhibited significant gains on this Index,
which rates students' responses to a series of patient situations/statements.
While it is important to monitor each class' performance independently,
summarizing the results across all samples of similar subjects who participated
in the program during successive academic years provides a more stable
estimate of the effectiveness of training.

All three classes exhibited significant gains (paired t = -8.55 to -24.18;
p < .001, two-tailed). The Fisher (χ²(6) = 45.6), Winer (z = 26.43), and Stouffer
(z = 5.72) combined tests were each highly significant (p < .001, one-tailed).
The average effect size was 2.90 (Sd = 0.75), indicating that the average
performance at posttesting was equivalent to the 99.8th percentile on the
pretest. These findings are summarized in Tables 3 and 4.



Insert Tables 3 and 4 about here



Case C: Synthesizing Independent Program Components

Goal attainment scaling (Kiresuk & Lund, 1976) was used to evaluate the
impact of services provided by four independent agencies (Adult Mental Health
Center, Elderly Home Aid Services, Crisis Intervention/Hotline, and Children's
Services) that comprise a county mental health board (Wolf & Blixt, 1981). Goal
attainment follow-up guides were completed at intake and again during follow-up
10 weeks later. Paired t-tests indicated that on the average clients in each
of the agencies exhibited significant improvement (paired t = 10.28 to 12.02; p
< .001, two-tailed). Combined tests used to synthesize and summarize these
independent results confirmed (p < .001, one-tailed) these findings with respect
to the combined populations (Fisher χ² = 60.8; Winer z = 20.91; Stouffer z =
6.60). The average effect size of 3.79 indicated that average follow-up
scores were equivalent to scores at the 99.9th percentile on the distribution of
scores at intake. These findings are summarized in more detail in Tables 5 and 6.



Insert Tables 5 and 6 about here

Conclusions and Recommendations



The above examples illustrate the practical utility of using combined tests
and measures of effect size in program evaluation in situations where data are
available either cross-sectionally, or on successive occasions, or on independent
components of a larger program. It is suggested that a combined test and
measure of effect size both be used rather than presenting one without the
other. The choice of a combined test may rest on several factors, such as the
information available (e.g., only p values may be available in some instances),
ease of computation, or the desire for consistency between the combined test
selected and the statistic used for the independent tests (e.g., the Winer
procedure would be more consistent with summing independent t-statistics,
while the Stouffer procedure would be more consistent with summing
independent z-statistics). Measures of effect size are clearly valuable in
providing potential insight into the differential impact of a given program,
information that generally is more obscured when relying solely upon statistical
tests.



Table 1

Means, Standard Deviations, and Paired t-Tests
for Student Performance on the
CTBS Mathematics Achievement Test

                     Pre             Post
Grade     n        M     Sd        M     Sd      Paired t
  1      309      0.9    .62      2.4    .51      43.32*
  2      362      2.1    .64      3.1    .72      35.47*
  3      363      3.1    .75      4.6   1.21      36.11*
  4      340      4.4   1.33      5.6   1.52      24.39*
  5      331      5.4   1.55      6.7   2.08      19.24*
  6      304      6.2   1.93      7.5   2.21      17.30*
  7      322      7.2   2.09      8.5   2.49      18.04*
  8      299      8.2   2.29      9.4   2.47      14.17*

* p < .001, two-tailed test



Table 2

Results of Paired t-Tests for Student Performance
on the CTBS Mathematics Achievement Test

Grade     n     Paired t   One-tailed p   -2 log_e p     d      U3(%)
  1      309     43.32        .0005         15.20       2.42    99.2
  2      362     35.47        .0005         15.20       1.56    94.1
  3      363     36.11        .0005         15.20       2.00    97.7
  4      340     24.39        .0005         15.20       0.90    81.6
  5      331     19.24        .0005         15.20       0.84    79.5
  6      304     17.30        .0005         15.20       0.67    74.9
  7      322     18.04        .0005         15.20       0.62    73.2
  8      299     14.17        .0005         15.20       0.52    69.4
Average:                                                1.19    83.8

Table 3



Means, Standard Deviations, and Paired
t-Tests for First-year Medical Student Performance
on Carkhuff Standard Index of Communication

Graduation                    Pre              Post
Year            n           M     Sd         M     Sd      Paired t
1981           46          1.55   .30       2.60   .22     -24.18*
1982       [illegible]     1.32   .39       2.54   .48     -14.16*
1983           42          1.47   .52       2.55   .59      -8.55*

* p < .001, two-tailed test

Table 4



Results of Paired t-Tests for First-year
Medical Student Performance on Carkhuff
Standard Index of Communication

Graduation
Year            n        Paired t   One-tailed p   -2 log_e p     d      U3(%)
1981           46        -24.18        .0005         15.20       3.52    99.9
1982       [illegible]   -14.16        .0005         15.20       3.12    99.9
1983           42         -8.55        .0005         15.20       2.07    98.0
Average                                                          2.90    99.8

Table 5



Goal Attainment Scaling Evaluation Results
for County Mental Health Agency Services

                                          Intake                  Follow-up
Service                         n        M           Sd          M           Sd     Paired t
Adult Mental Health Services   20      37.62        3.95     [illegible]    6.83     12.02*
Elderly Home Aid Services      19   [illegible]  [illegible]    53.93       8.51     10.28*
Crisis Intervention/Hotline    20   [illegible]     4.25     [illegible]    6.61     10.57*
Children's Services            31      38.13       10.52     [illegible]    9.66     11.15*

* p < .001, two-tailed test

Table 6



Results of Paired t-Tests for Goal
Attainment Scaling Evaluations of
County Mental Health Agency Services

Service                         n    Paired t   One-tailed p   -2 log_e p     d      U3(%)
Adult Mental Health Services   20     12.02        .0005         15.20       5.17    99.9
Elderly Home Aid Services      19     10.28        .0005         15.20       3.99    99.9
Crisis Intervention/Hotline    20     10.57        .0005         15.20       4.11    99.9
Children's Services            31     11.15        .0005         15.20       1.87    96.9
Average:                                                                     3.79    99.9

Figure 1. Illustration of average effect size in standard deviation units
(σx) of student performance on the CTBS Mathematics Achievement Test
for grades 1-8.